Reducing Downtime Due to System Maintenance and Upgrades
Abstract
Patching, upgrading, and maintaining operating system software is a growing management-complexity problem that can result in unacceptable system downtime. We introduce AutoPod, a system that enables unscheduled operating system updates while preserving application service availability. AutoPod provides a group of processes and their associated users with an isolated, machine-independent virtualized environment that is decoupled from the underlying operating system instance. This virtualized environment is integrated with a novel checkpoint-restart mechanism that allows processes to be suspended, resumed, and migrated across operating system kernel versions with different security and maintenance patches. AutoPod incorporates a system status service that determines when operating system patches need to be applied to the current host; it then automatically migrates application services to another host to preserve their availability while the current host is updated and rebooted. We have implemented AutoPod on Linux without requiring any changes to applications or to the operating system kernel. Our measurements on real-world desktop and server applications demonstrate that AutoPod imposes little overhead and provides sub-second suspend and resume times that can be an order of magnitude faster than starting applications after a system reboot. AutoPod enables systems to autonomically stay up to date with relevant maintenance and security patches while ensuring no loss of data and minimizing service disruption.
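The update cycle described in the abstract (detect that a patch is needed, checkpoint the pod, migrate it to another host, then patch and reboot the original host) can be sketched as a toy control loop. This is a minimal simulation under stated assumptions, not AutoPod's implementation: the classes `Pod` and `Host`, the functions `needs_patch` and `update_host`, and the kernel version strings are all hypothetical, and a "checkpoint" here is just a copy of a Python dict rather than real process state.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical stand-in for a pod: a named group of processes whose
    state can be captured as a checkpoint image (here, just a dict copy)."""
    name: str
    state: dict
    running: bool = True

    def checkpoint(self) -> dict:
        # Suspend the pod and capture its state as a host-independent image.
        self.running = False
        return {"name": self.name, "state": dict(self.state)}

    @staticmethod
    def restart(image: dict) -> "Pod":
        # Resume a pod from a checkpoint image on the destination host.
        return Pod(image["name"], dict(image["state"]))


@dataclass
class Host:
    """Hypothetical host running some kernel version and a set of pods."""
    name: str
    kernel: str
    pods: list = field(default_factory=list)


def needs_patch(host: Host, latest_kernel: str) -> bool:
    # Assumed status check: a host needs updating if it is not on the latest kernel.
    return host.kernel != latest_kernel


def update_host(host: Host, spare: Host, latest_kernel: str) -> None:
    """Evacuate pods to a spare host, then patch and 'reboot' the original host."""
    if not needs_patch(host, latest_kernel):
        return
    # 1. Checkpoint each pod and restart it on the spare host.
    for pod in list(host.pods):
        image = pod.checkpoint()
        spare.pods.append(Pod.restart(image))
        host.pods.remove(pod)
    # 2. With no services left on it, the host can be patched and rebooted.
    host.kernel = latest_kernel
```

For example, `update_host(Host("a", "v1", [Pod("web", {"conns": 3})]), Host("b", "v2"), "v2")` leaves host `a` on `"v2"` with no pods, and host `b` running a restored `web` pod with the same state. The key design point the sketch tries to reflect is that services keep running elsewhere while the original host is updated, rather than waiting for a reboot.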
Similar Resources
Reducing Downtime Due to System Maintenance and Upgrades (Awarded Best Student Paper!)
Patching, upgrading, and maintaining operating system software is a growing management-complexity problem that can result in unacceptable system downtime. We introduce AutoPod, a system that enables unscheduled operating system updates while preserving application service availability. AutoPod provides a group of processes and associated users with an isolated machine-independent virtualized env...
MetaMorphMagi: From Offline to Online Software Upgrades in Large-Scale IT Infrastructures
Software upgrades are one of the leading causes of downtime in IT infrastructures. Long-running data-migration processes require intensive up-front preparation, extended maintenance windows, and close monitoring, and they impose a significant burden on system administrators. Even worse, major upgrades sometimes fail due to complex, hidden dependencies within the system, causing unplanned down...
Why Do Upgrades Fail And What Can We Do About It? Toward Dependable, Online Upgrades in Enterprise System
Enterprise-system upgrades are unreliable and often produce downtime or data-loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading cause of upgrade failures. We propose a novel upgrade-centric fault model, based on data from three independent sources, which focuses on the impact of procedural errors rather than software defects. We show that current approach...
Why Do Upgrades Fail and What Can We Do about It? Toward Dependable, Online Upgrades in Enterprise Systems
Enterprise-system upgrades are unreliable and often produce downtime or data-loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading cause of upgrade failures. We propose a novel upgrade-centric fault model, based on data from three independent sources, which focuses on the impact of procedural errors rather than software defects. We show that current approache...
An application of artificial neural network to maintenance management
This study shows the usefulness of an Artificial Neural Network (ANN) in maintenance planning and management. An ANN model based on a multi-layer perceptron with three hidden layers and four processing elements per layer was built to predict the expected downtime resulting from a breakdown or a maintenance activity. The model achieved an accuracy of over 70% in predicting the expected downtime.